AITopics | generalist model


VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wang, Wenhai, Chen, Zhe, Chen, Xiaokang, Wu, Jiannan

Neural Information Processing Systems | Feb-16-2026, 22:29:37 GMT

It's noteworthy that, with a generalist LLM-based framework, our model can achieve over 60% mAP on COCO, on par with

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

11fc8c98b46d4cbdfe8157267228f7d7-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 13:05:54 GMT

arxiv preprint arxiv, conditional moe, generalist model, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks

Vishwanath, Krithik, Ghosh, Mrigayu, Alyakin, Anton, Alber, Daniel Alexander, Aphinyanaphongs, Yindalon, Oermann, Eric Karl

arXiv.org Artificial Intelligence | Dec-2-2025

Specialized clinical AI assistants are rapidly entering medical practice, often framed as safer or more reliable than general-purpose large language models (LLMs). Yet, unlike frontier models, these clinical tools are rarely subjected to independent, quantitative evaluation, creating a critical evidence gap despite their growing influence on diagnosis, triage, and guideline interpretation. We assessed two widely deployed clinical AI systems (OpenEvidence and UpToDate Expert AI) against three state-of-the-art generalist LLMs (GPT-5, Gemini 3 Pro, and Claude Sonnet 4.5) using a 1,000-item mini-benchmark combining MedQA (medical knowledge) and HealthBench (clinician-alignment) tasks. Generalist models consistently outperformed clinical tools, with GPT-5 achieving the highest scores, while OpenEvidence and UpToDate demonstrated deficits in completeness, communication quality, context awareness, and systems-based safety reasoning. These findings reveal that tools marketed for clinical decision support may often lag behind frontier LLMs, underscoring the urgent need for transparent, independent evaluation before deployment in patient-facing workflows.

generalist model, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.01191

Country:

North America > United States > New York (0.23)
North America > United States > Texas > Travis County > Austin (0.16)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.77)

Industry:

Health & Medicine > Health Care Providers & Services (0.96)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Single Tensor Cell Segmentation using Scalar Field Representations

Vargas, Kevin I. Ruiz, Galdino, Gabriel G., Ren, Tsang Ing, Cunha, Alexandre L.

arXiv.org Artificial Intelligence | Nov-19-2025

We investigate image segmentation of cells under the lens of scalar fields. Our goal is to learn a continuous scalar field on image domains such that its segmentation produces robust instances for cells present in images. This field is a function parameterized by the trained network, and its segmentation is realized by the watershed method. The fields we experiment with are solutions to the Poisson partial differential equation and a diffusion mimicking the steady-state solution of the heat equation. These solutions are obtained by minimizing only the field residuals, with no regularization needed, which provides a robust regression capable of diminishing the adverse impact of outliers in the training data and allows for sharp cell boundaries. A single tensor is all that is needed to train a U-Net, thus simplifying implementation, lowering training and inference times (and hence energy consumption), and requiring a small memory footprint, all attractive features in edge computing. We present competitive results on public datasets from the literature and show that our novel, simple yet geometrically insightful approach can achieve excellent cell segmentation results.
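To make the idea of a Poisson scalar field over a cell concrete, here is a minimal sketch. It is not the authors' method (they train a network to regress the field); it only computes a target-like field directly. The source term of -1, the zero boundary condition, the Jacobi solver, and the names `poisson_field` and `square_mask` are all assumptions chosen for illustration.

```python
def poisson_field(mask, iterations=500):
    """Approximate u with laplacian(u) = -1 inside the mask and u = 0 outside.

    Uses plain Jacobi iteration on a unit grid (an assumed, simple solver).
    """
    rows, cols = len(mask), len(mask[0])
    field = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        updated = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not mask[r][c]:
                    continue  # outside the cell: the field stays at 0
                neighbours = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbours += field[rr][cc]
                # Discrete Poisson update: 4u - sum(neighbours) = 1
                updated[r][c] = (neighbours + 1.0) / 4.0
        field = updated
    return field


def square_mask(size, margin):
    """Hypothetical toy 'cell': a filled square inside a size x size image."""
    inside = range(margin, size - margin)
    return [[1 if r in inside and c in inside else 0 for c in range(size)]
            for r in range(size)]


mask = square_mask(7, 1)
field = poisson_field(mask)
for row in field:
    print(" ".join(f"{value:5.2f}" for value in row))
```

In this toy case the field is zero on the background and rises toward the middle of the cell, which is the kind of single-peaked surface a watershed step can split into one basin per cell.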

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2511.13947

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.96)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Generalist Foundation Models Are Not Clinical Enough for Hospital Operations

Jiang, Lavender Y., Chen, Angelica, Han, Xu, Liu, Xujin Chris, Dua, Radhika, Eaton, Kevin, Wolff, Frederick, Steele, Robert, Zhang, Jeff, Alyakin, Anton, Pan, Qingkai, Chen, Yanbing, Sangwon, Karl L., Alber, Daniel A., Stryker, Jaden, Lee, Jin Vivian, Aphinyanaphongs, Yindalon, Cho, Kyunghyun, Oermann, Eric Karl

arXiv.org Artificial Intelligence | Nov-18-2025

Hospitals and healthcare systems rely on operational decisions that determine patient flow, cost, and quality of care. Despite strong performance on medical knowledge and conversational benchmarks, foundation models trained on general text may lack the specialized knowledge required for these operational decisions. We introduce Lang1, a family of models (100M-7B parameters) pretrained on a specialized corpus blending 80B clinical tokens from NYU Langone Health's EHRs and 627B tokens from the internet. To rigorously evaluate Lang1 in real-world settings, we developed the REalistic Medical Evaluation (ReMedE), a benchmark derived from 668,331 EHR notes that evaluates five critical tasks: 30-day readmission prediction, 30-day mortality prediction, length of stay, comorbidity coding, and predicting insurance claims denial. In zero-shot settings, both general-purpose and specialized models underperform on four of five tasks (36.6%-71.7% AUROC), with mortality prediction being an exception. After finetuning, Lang1-1B outperforms finetuned generalist models up to 70x larger and zero-shot models up to 671x larger, improving AUROC by 3.64%-6.75% and 1.66%-23.66% respectively. We also observed cross-task scaling with joint finetuning on multiple tasks leading to improvement on other tasks. Lang1-1B effectively transfers to out-of-distribution settings, including other clinical tasks and an external health system. Our findings suggest that predictive capabilities for hospital operations require explicit supervised finetuning, and that this finetuning process is made more efficient by in-domain pretraining on EHR. Our findings support the emerging view that specialized LLMs can compete with generalist models in specialized tasks, and show that effective healthcare systems AI requires the combination of in-domain pretraining, supervised finetuning, and real-world evaluation beyond proxy benchmarks.
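The results above are reported as AUROC. As a reminder of what that number measures, here is a minimal sketch that computes it as the fraction of positive/negative pairs in which the positive example receives the higher score (ties count as half). The function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the reported ranges.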

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.13703

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.47)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting

Kayaalp, Mert, Turkmen, Caner, Shchur, Oleksandr, Mercado, Pedro, Ansari, Abdul Fatir, Bohlke-Schneider, Michael, Wang, Bernie

arXiv.org Artificial Intelligence | Oct-9-2025

Is bigger always better for time series foundation models? With this question in mind, we explore an alternative to training a single, large monolithic model: building a portfolio of smaller, pretrained forecasting models. By applying ensembling or model selection over these portfolios, we achieve competitive performance on large-scale benchmarks using far fewer parameters. We explore strategies for designing such portfolios and find that collections of specialist models consistently outperform portfolios of independently trained generalists. Remarkably, we demonstrate that post-training a base model is a compute-effective approach for creating sufficiently diverse specialists, and provide evidence that ensembling and model selection are more compute-efficient than test-time fine-tuning.
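The two test-time strategies named above, model selection and ensembling over a portfolio, can be sketched in a few lines. This is an illustration under stated assumptions only: the three toy forecasters stand in for the pretrained models in the paper, and every name here (`PORTFOLIO`, `select_model`, `ensemble_forecast`, the validation split) is hypothetical.

```python
from statistics import mean


def naive_last(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, season=4):
    """Repeat the last full season (assumes len(history) >= season)."""
    return [history[-season + (h % season)] for h in range(horizon)]


def drift(history, horizon):
    """Extrapolate the average slope between first and last observation."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


# Hypothetical portfolio: name -> forecaster(history, horizon)
PORTFOLIO = {"naive_last": naive_last, "seasonal_naive": seasonal_naive, "drift": drift}


def mae(forecast, actual):
    return mean(abs(f - a) for f, a in zip(forecast, actual))


def select_model(portfolio, history, val_len):
    """Model selection: pick the member with the lowest error on a held-out tail."""
    train, val = history[:-val_len], history[-val_len:]
    errors = {name: mae(model(train, val_len), val) for name, model in portfolio.items()}
    return min(errors, key=errors.get)


def ensemble_forecast(portfolio, history, horizon):
    """Ensembling: average the members' forecasts step by step."""
    forecasts = [model(history, horizon) for model in portfolio.values()]
    return [mean(step) for step in zip(*forecasts)]


history = [1, 2, 3, 4] * 3  # toy seasonal series
best = select_model(PORTFOLIO, history, val_len=4)
combined = ensemble_forecast(PORTFOLIO, history, horizon=2)
print(best, combined)
```

Both strategies only run forward passes of already-trained members, so no weights are updated at test time; that is the contrast with test-time fine-tuning drawn in the abstract.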

data mining, machine learning, portfolio, (19 more...)

arXiv.org Artificial Intelligence

2510.06419

Country: North America (0.46)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Industry: Energy (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents

Yang, Qinglong, Li, Haoming, Zhao, Haotian, Yan, Xiaokai, Ding, Jingtao, Xu, Fengli, Li, Yong

arXiv.org Artificial Intelligence | Jul-30-2025

Mobile GUI agents are becoming critical tools for enhancing human-device interaction efficiency, with multimodal large language models (MLLMs) emerging as dominant paradigms in this domain. Current agents, however, are limited to following explicit human instructions, resulting in insufficient capability for proactive intent anticipation. Additionally, these agents fail to leverage the contextual information associated with users during task execution, thereby neglecting potentially vast differences in user preferences. To address these challenges, we introduce the FingerTip benchmark. It contains two new tracks: proactive task suggestions by analyzing environment observation and users' previous intents, and personalized task execution by catering to users' action preferences. We collected unique human demonstrations of multi-step Android device interactions across a variety of everyday apps. These demonstrations are not isolated but are continuously acquired from the users' long-term usage in their real lives, and encompass essential user-related contextual information. Our experiments reveal challenges of the tasks we propose. The model fine-tuned with the data we collected effectively utilized user information and achieved good results, highlighting the potential of our approach in building more user-oriented mobile GUI agents. Our code is open-source at https://anonymous.4open.science/r/FingerTip-57B8 for reproducibility.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2507.21071

Genre: Research Report (0.50)

Industry:

Information Technology (0.54)
Telecommunications (0.34)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Compositional Understanding in Signaling Games

Freeborn, David Peter Wallis

arXiv.org Artificial Intelligence | Jul-22-2025

Even when the signalers send compositional messages, the receivers do not interpret them compositionally. When information from one message component is lost or forgotten, the information from other components is also erased. In this paper I construct signaling game models in which genuine compositional understanding evolves. I present two new models: a minimalist receiver who only learns from the atomic messages of a signal, and a generalist receiver who learns from all of the available information. These models are in many ways simpler than previous alternatives, and allow the receivers to learn from the atomic components of messages.

artificial intelligence, information, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.15706

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
